Abstract
Background: Venous thromboembolism (VTE) is a potentially preventable complication in cancer patients undergoing systemic therapy. Randomized controlled trials have shown that administration of prophylactic anticoagulants may mitigate this risk. As such, real-time and dynamic identification of patients at risk of VTE at any given point in time is critical. Artificial intelligence/machine learning (AI/ML) methods that can integrate a large amount of complex information that accrues over the patient's history are well-suited to this task.
Methods: Data for this study were derived from the Corporate Data Warehouse (CDW) of the Veterans Affairs (VA) healthcare system in the US. We included all cancer patients in the VA Cancer Registry from 2006 to 2022, assigning 80,808 patients with systemic treatment initiation dates from 2011 to 2020 to the primary cohort and 3,303 patients with initiation dates from 2021 to 2022 to the temporal validation cohort. Patients were excluded if they were not primary users of the VA healthcare system, had received prior anticoagulation, or had a VTE diagnosis within 6 months of systemic treatment.
We developed a novel transformer-based AI/ML model with event-feature embedding and multi-headed attention layers that predicts VTE within one year of systemic treatment initiation. Longitudinal patient trajectories composed of time-stamped diagnostic codes (phecodes) and laboratory records from the patient's medical history were constructed to predict VTE risk. Input trajectories comprised 1,862 distinct phecodes, which are synoptic concepts derived from diagnostic ICD codes, and 18 laboratory tests drawn from the complete blood cell count and metabolic panels. Sex, race, and BMI were included as static covariates. Cancer diagnosis dates and treatment index dates were used as anchor times so that the positional encoding of each event is relative to these points in the patient's history.
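As a rough illustration of anchoring events to the two reference dates, the sketch below converts time-stamped events into day offsets relative to the cancer diagnosis date and the treatment index date. It is only a minimal sketch: the `Event` type, the `relative_offsets` helper, the event codes, and the choice of days as the unit are all hypothetical, and the abstract does not describe how the model actually encodes these offsets.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Event(NamedTuple):
    """A time-stamped trajectory entry (hypothetical structure)."""
    code: str   # placeholder identifier for a phecode or laboratory test
    when: date  # date the event was recorded


def relative_offsets(
    events: List[Event], diagnosis_date: date, index_date: date
) -> List[Tuple[str, int, int]]:
    """Return (code, days since diagnosis, days since treatment index).

    Events are sorted chronologically; negative offsets mean the event
    happened before the corresponding anchor date.
    """
    ordered = sorted(events, key=lambda e: e.when)
    return [
        (e.code, (e.when - diagnosis_date).days, (e.when - index_date).days)
        for e in ordered
    ]


# Made-up example: one lab before diagnosis, one phecode after treatment start.
example_events = [
    Event("phecode_A", date(2015, 3, 1)),
    Event("lab_B", date(2015, 1, 1)),
]
example_offsets = relative_offsets(
    example_events, diagnosis_date=date(2015, 1, 15), index_date=date(2015, 2, 1)
)
```

Offsets of this kind could then feed a positional encoding in place of absolute sequence positions; how they are embedded is a modeling choice this sketch does not attempt to reproduce.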
The transformer-based AI/ML model used four distinct input trajectories of phecodes and labs to predict VTE risk in four consecutive quarterly look-forward windows following the initiation of systemic treatment. For each patient, input trajectories started at 3 months before the index date and ended at 0, 3, 6, and 9 months after treatment initiation. Corresponding look-forward prediction windows were constructed at (Q1) 0-3, (Q2) 3-6, (Q3) 6-9, and (Q4) 9-12 months beyond the treatment index date. If a patient experienced VTE at a given quarter, they were considered a positive case for that quarter and excluded from subsequent prediction intervals.
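The windowing and labeling scheme described above can be sketched as follows. This is an illustrative reading of the text, not the study's code: the function names are hypothetical, time is measured in months relative to the treatment index date, and treating each prediction window as half-open (start inclusive, end exclusive) is an assumption. Censoring from death or loss to follow-up is not handled.

```python
from typing import List, Optional, Tuple

QUARTER_MONTHS = 3   # length of each look-forward window
N_QUARTERS = 4       # Q1 through Q4
LOOKBACK_MONTHS = 3  # trajectories start 3 months before the index date

Span = Tuple[int, int]


def quarterly_windows() -> List[Tuple[Span, Span]]:
    """Return (input_span, prediction_span) pairs, in months from the index date."""
    windows = []
    for q in range(N_QUARTERS):
        start = q * QUARTER_MONTHS
        input_span = (-LOOKBACK_MONTHS, start)
        prediction_span = (start, start + QUARTER_MONTHS)
        windows.append((input_span, prediction_span))
    return windows


def quarterly_labels(vte_month: Optional[float]) -> List[Optional[int]]:
    """Label each quarter as 1 (VTE), 0 (no VTE), or None (excluded).

    vte_month is the time of the first VTE in months after the index date,
    or None if no VTE was observed. Quarters after the event quarter are
    excluded, mirroring the rule stated in the text.
    """
    labels: List[Optional[int]] = []
    event_seen = False
    for _, (lo, hi) in quarterly_windows():
        if event_seen:
            labels.append(None)
        elif vte_month is not None and lo <= vte_month < hi:
            labels.append(1)
            event_seen = True
        else:
            labels.append(0)
    return labels
```

For example, a patient whose first VTE occurs 4.5 months after the index date is a negative case in Q1, a positive case in Q2, and excluded from Q3 and Q4.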
Patients in the primary cohort were split into training, development, and test sets in a ratio of 60:20:20. The transformer was developed using the training and development sets with a logistic loss function and balanced sampling. Model performance was assessed using area under the receiver operating characteristic curve (AUC).
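To make the evaluation setup concrete, the sketch below shows one way to form a 60:20:20 patient-level split and to compute AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. Both helpers are hypothetical and use only the standard library; the random seed, the shuffling strategy, and the pairwise AUC computation are assumptions rather than the study's procedure, and the pairwise loop is only practical for small examples.

```python
import random
from typing import List, Sequence, Tuple


def split_60_20_20(
    patient_ids: Sequence[int], seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle patient IDs and split them into train, development, and test sets."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 3 // 5  # 60%
    n_dev = len(shuffled) // 5        # 20%
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # remaining ~20%
    return train, dev, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

Splitting by patient rather than by quarter keeps all four quarterly predictions for a given patient in the same set, which avoids leakage between training and test data.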
Results: Of the 80,808 patients in the primary cohort, 4,873 (6%) developed VTE within one year of the treatment index date. The incidence of VTE decreased over time: 3.2% in the first quarter (Q1) after the treatment index date, 1.7% in Q2, 1.1% in Q3, and 0.9% in Q4. Test set predictions yielded AUCs of .68, .68, .71, and .77 for Q1-Q4, respectively, with recall (sensitivity) of .82, .81, .78, and .84 after spline recalibration. Performance was largely consistent across subgroups defined by cancer type, treatment type, and demographics. In the temporal validation cohort, AUCs were .65, .60, .76, and .76 for Q1-Q4. Compared with existing clinical risk prediction tools, specifically the widely used Khorana score (Khorana et al., 2005) and EHR-CAT (Li et al., 2023), the transformer model performed similarly in Q1-Q2 but significantly increased identification of high-risk patients, by up to 50%, in Q3-Q4 (6-12 months after treatment initiation).
Conclusion: By leveraging disease histories and laboratory records from the comprehensive EHR in the VA CDW, our proposed transformer-based AI/ML model effectively predicts VTE risk from disease and lab trajectories. This novel model is uniquely suited for dynamic risk assessment because it can incorporate pertinent time-dependent information along the entire treatment trajectory.